Machine Learning of Syntactic Attachment from Morphosyntactic and Semantic Co-occurrence Statistics

Authors

  • Szymon Acedanski
  • Adam Slaski
  • Adam Przepiórkowski
Abstract

The paper presents a novel approach to extracting dependency information in morphologically rich languages using co-occurrence statistics based not only on lexical forms (as in previously described collocation-based methods), but also on morphosyntactic and wordnet-derived semantic properties of words. Statistics generated from a corpus annotated only at the morphosyntactic level are used as features in a machine learning classifier that detects which heads of groups found by a shallow parser are likely to be connected by an edge in the complete parse tree. The approach reaches a precision of 89% and a recall of 65%, with an extra 6% of recall if only words present in the wordnet are considered.
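The idea of turning co-occurrence statistics at more than one level of description into classifier features can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corpus, the tag names, the function names (`cooccurrence_counts`, `pmi`, `pair_features`) and the choice of pointwise mutual information as the statistic are all assumptions made for illustration, and the wordnet-derived semantic level and the classifier itself are left out.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each token is (lemma, morphosyntactic tag).
CORPUS = [
    [("dog", "subst:nom"), ("bite", "fin"), ("bone", "subst:acc")],
    [("cat", "subst:nom"), ("drink", "fin"), ("milk", "subst:acc")],
    [("dog", "subst:nom"), ("drink", "fin"), ("water", "subst:acc")],
]

def cooccurrence_counts(corpus, level):
    """Count items and unordered within-sentence pairs at one level
    (0 = lemma, 1 = morphosyntactic tag)."""
    singles, pairs = Counter(), Counter()
    for sentence in corpus:
        items = [token[level] for token in sentence]
        singles.update(items)
        pairs.update(
            frozenset(pair) for pair in combinations(items, 2) if pair[0] != pair[1]
        )
    return singles, pairs

def pmi(a, b, singles, pairs):
    """Pointwise mutual information of a and b under the given counts."""
    pair_count = pairs[frozenset((a, b))]
    if pair_count == 0:
        return 0.0  # assumption: unseen pairs get a neutral score
    p_ab = pair_count / sum(pairs.values())
    p_a = singles[a] / sum(singles.values())
    p_b = singles[b] / sum(singles.values())
    return math.log(p_ab / (p_a * p_b))

def pair_features(head_a, head_b, corpus):
    """Feature dictionary for a candidate edge between two group heads."""
    features = {}
    for level, name in enumerate(("lemma_pmi", "tag_pmi")):
        singles, pairs = cooccurrence_counts(corpus, level)
        features[name] = pmi(head_a[level], head_b[level], singles, pairs)
    return features

if __name__ == "__main__":
    print(pair_features(("dog", "subst:nom"), ("bite", "fin"), CORPUS))
```

Feature dictionaries of this kind, one per candidate pair of heads, could then be passed to any standard classifier that predicts whether the two heads are connected by an edge.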


Similar resources

Automatic Semantic Role Labeling in Persian Sentences Using Dependency Trees

Automatically identifying the words that bear semantic roles (such as Agent, Patient, Source, etc.) in sentences and assigning the correct roles to them may improve many natural language processing tasks, including information extraction, question answering, text summarization and machine translation. Semantic role labeling systems usually take advantage of syntactic parsing and th...


Development of the Semantic Aspect of Verbs in Persian-Speaking Children: A Longitudinal Study

Objective: Learning the verb, one of the main components of the sentence, has always been a debated topic in the study of language learning. One of the important issues in verb learning is determining its meaning using syntactic clues and learning its semantic aspects. Therefore, the main objective of this study was to examine the development of the semantic aspect of ...


Tagging for Learning: Collecting Thematic Relations from Corpus

Recent work in text analysis has suggested that data on words that frequently occur together reveal important information about text content. Co-occurrence relations can serve two main purposes in language processing. First, the statistics of co-occurrence have been shown to produce accurate results in syntactic analysis. Second, the way that words appear together can help in assigning themat...


Probabilistic Latent Semantic Analysis

Probabilistic Latent Semantic Analysis is a novel statistical technique for the analysis of two-mode and co-occurrence data, which has applications in information retrieval and filtering, natural language processing, machine learning from text, and in related areas. Compared to standard Latent Semantic Analysis which stems from linear algebra and performs a Singular Value Decomposition of co-occu...


A Learning Based Model for Chinese Co-reference Resolution by Mining Contextual Evidence

This paper presents a learning-based model for Chinese co-reference resolution, in which diverse contextual features, inspired by related linguistic theory, are explored. Our main motivation is to boost co-reference resolution performance by leveraging only multiple shallow syntactic and semantic features, which can escape from tough problems such as deep syntactic and semantic structu...




Publication date: 2012